1 Techniques for automatically correcting words in text 
Karen Kukich 

December 1992 ACM Computing Surveys (CSUR), volume 24 issue 4 
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms, review 



Full text available: pdf(6.23 MB) 



Research aimed at correcting words in text has focused on three progressively more 
difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3) 
context-dependent word correction. In response to the first problem, efficient 
pattern-matching and n-gram analysis techniques have been developed for detecting strings that 
do not appear in a given word list. In response to the second problem, a variety of general 
and application-specific spelling cor ... 

Keywords: n-gram analysis, Optical Character Recognition (OCR), context-dependent 
spelling correction, grammar checking, natural-language-processing models, neural net 
classifiers, spell checking, spelling error detection, spelling error patterns, 
statistical-language models, word recognition and correction 



2 Generic text summarization using relevance measure and latent semantic analysis 
Yihong Gong, Xin Liu 

September 2001 Proceedings of the 24th annual international ACM SIGIR conference 
on Research and development in information retrieval 
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms 



Full text available: pdf(173.90 KB) 



In this paper, we propose two generic text summarization methods that create text 
summaries by ranking and extracting sentences from the original documents. The first 
method uses standard IR methods to rank sentence relevances, while the second method 
uses the latent semantic analysis technique to identify semantically important sentences, 
for summary creations. Both methods strive to select sentences that are highly ranked 
and different from each other. This is an attempt to create a summa ... 

Keywords: generic text summarization, relevance measure, semantic analysis 



Results (page 1): document summary using sentence term vector frequency 
http://portal.acm.org/results.cfm?coll=ACM&dl=ACM&CFID=64495999&CFTOKEN=6... 1/4/06 

3 Summarization: Web-page summarization using clickthrough data 
Jian-Tao Sun, Dou Shen, Hua-Jun Zeng, Qiang Yang, Yuchang Lu, Zheng Chen 

August 2005 Proceedings of the 28th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '05 
Publisher: ACM Press 

Full text available: pdf(157.70 KB) Additional Information: full citation, abstract, references, index terms 

Most previous Web-page summarization methods treat a Web page as plain text. 
However, such methods fail to uncover the full knowledge associated with a Web page 
needed in building a high-quality summary, because many of these methods do not 
consider the hidden relationships in the Web. Uncovering the hidden knowledge is 
important in building good Web-page summarizers. In this paper, we extract the extra 
knowledge from the clickthrough data of a Web search engine to improve Web-page 
summarization ... 

Keywords: clickthrough data, generic web-page summarization, latent semantic analysis, 
thematic lexicon 



4 Automatic phrase indexing for document retrieval 
J. Fagan 

November 1987 Proceedings of the 10th annual international ACM SIGIR conference 
on Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(1.10 MB) Additional Information: full citation, abstract, references, citings, index terms 

An automatic phrase indexing method based on the term discrimination model is 
described, and the results of retrieval experiments on five document collections are 
presented. Problems related to this non-syntactic phrase construction method are 
discussed, and some possible solutions are proposed that make use of information about 
the syntactic structure of document and query texts. 

5 Summarization: Topic themes for multi-document summarization 
Sanda Harabagiu, Finley Lacatusu 

August 2005 Proceedings of the 28th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '05 

Publisher: ACM Press 

Full text available: pdf(245.85 KB) Additional Information: full citation, abstract, references, index terms 

The problem of using topic representations for multi-document summarization (MDS) has 
received considerable attention recently. In this paper, we describe five different topic 
representations and introduce a novel representation of topics based on topic themes. We 
present eight different methods of generating MDS and evaluate each of these methods 
on a large set of topics used in past DUC workshops. Our evaluation results show a 
significant improvement in the quality of summaries based on topic ... 

Keywords: summarization, topic themes 



6 Information access and retrieval: Multiple related document summary and navigation 
using concept hierarchies for mobile clients 
D. L. Chan, R. W. P. Luk, W. K. Mak, H. V. Leong, E. K. S. Ho, Q. Lu 

March 2002 Proceedings of the 2002 ACM symposium on Applied computing 

Publisher: ACM Press 

Full text available: pdf(660.36 KB) Additional Information: full citation, abstract, references, index terms 

Mobile clients have limited display and navigation capabilities. To browse a set of 
documents, an intuitive method is to navigate through concept hierarchies. To reduce 
semantic loading for each term that represents the concepts and the cognitive loading of 
users due to the limited display, similar documents are grouped together before concept 
hierarchies are constructed for each document group. Since the concept hierarchies only 
represent the salient concepts in the documents, term extraction i ... 
7 Entity-based cross-document coreferencing using the Vector Space Model 
Amit Bagga, Breck Baldwin 

August 1998 Proceedings of the 17th international conference on Computational 
linguistics - Volume 1, Proceedings of the 36th annual meeting on 
Association for Computational Linguistics - Volume 1 

Publisher: Association for Computational Linguistics, Association for Computational Linguistics 

Full text available: pdf(593.52 KB) Additional Information: full citation, abstract, references, citings 
Publisher Site 

Cross-document coreference occurs when the same person, place, event, or concept is 
discussed in more than one text source. Computer recognition of this phenomenon is 
important because it helps break "the document boundary" by allowing a user to examine 
information about a particular entity from multiple text sources at the same time. In this 
paper we describe a cross-document coreference resolution algorithm which uses the 
Vector Space Model to resolve ambiguities between people having the same ... 

8 Text classification: Web-page classification through summarization 
Dou Shen, Zheng Chen, Qiang Yang, Hua-Jun Zeng, Benyu Zhang, Yuchang Lu, Wei-Ying Ma 

July 2004 Proceedings of the 27th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '04 
Publisher: ACM Press 

Full text available: pdf(225.80 KB) Additional Information: full citation, abstract, references, index terms, review 

Web-page classification is much more difficult than pure-text classification due to a large 
variety of noisy information embedded in Web pages. In this paper, we propose a new 
Web-page classification algorithm based on Web summarization for improving the 
accuracy. We first give empirical evidence that ideal Web-page summaries generated by 
human editors can indeed improve the performance of Web-page classification algorithms. 
We then propose a new Web summarization-based classification algorithm ... 

Keywords: content body, web page categorization, web page summarization 



9 Summarizing text documents: sentence selection and evaluation metrics 
Jade Goldstein, Mark Kantrowitz, Vibhu Mittal, Jaime Carbonell 

August 1999 Proceedings of the 22nd annual international ACM SIGIR conference on 
Research and development in information retrieval 
Publisher: ACM Press 

Full text available: pdf(273.01 KB) Additional Information: full citation, references, citings, index terms 



10 Set-based vector model: An efficient approach for correlation-based ranking 
Bruno Possas, Nivio Ziviani, Wagner Meira, Berthier Ribeiro-Neto 
October 2005 ACM Transactions on Information Systems (TOIS), Volume 23 issue 4 

Publisher: ACM Press 

Full text available: pdf(800.89 KB) Additional Information: full citation, abstract, references, index terms 

This work presents a new approach for ranking documents in the vector space model. The 
novelty lies in two fronts. First, patterns of term co-occurrence are taken into account and 
are processed efficiently. Second, term weights are generated using a data mining 
technique called association rules. This leads to a new ranking mechanism called the 
set-based vector model. The components of our model are no longer index terms but index 
termsets, where a termset is a set of index terms. Termset ... 

Keywords: Information retrieval models, association rule mining, correlation-based 
11 Using syntactic analysis in a document retrieval system that uses signature files 
R. Sacks-Davis, P. Wallis, R. Wilkinson 

December 1989 Proceedings of the 13th annual international ACM SIGIR conference 
on Research and development in information retrieval 
Publisher: ACM Press 

Full text available: pdf(1.15 MB) Additional Information: full citation, abstract, references, citings, index terms 

Our work involves the study of the extent to which natural language processing 
techniques aid the automatic indexing and retrieval of documents. In this paper we 
describe the use of signature files in large text retrieval systems. We show that good 
performance can be obtained without requiring the significant overheads required for the 
inverted file technique. We examine the use of syntactic analysis of the text in all stages 
of retrieval and argue that an initial Boolean query should be pe ... 

12 Multidocument summarization: An added value to clustering in interactive retrieval 
Manuel J. Mana-Lopez, Manuel De Buenaga, Jose M. Gomez-Hidalgo 

April 2004 ACM Transactions on Information Systems (TOIS), Volume 22 issue 2 

Publisher: ACM Press 

Full text available: pdf(199.91 KB) Additional Information: full citation, abstract, references, index terms, review 

A more and more generalized problem in effective information access is the presence in 
the same corpus of multiple documents that contain similar information. Generally, users 
may be interested in locating, for a topic addressed by a group of similar documents, one 
or several particular aspects. This kind of task, called instance or aspectual retrieval, has 
been explored in several TREC Interactive Tracks. In this article, we propose in addition to 
the classification capacity of clustering techn ... 

Keywords: Multidocument summarization, topic segmentation 



13 Extracting sentence segments for text summarization: a machine learning approach 
Wesley T. Chuang, Jihoon Yang 

July 2000 Proceedings of the 23rd annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(945.26 KB) Additional Information: full citation, abstract, references, citings, index terms 

With the proliferation of the Internet and the huge amount of data it transfers, text 
summarization is becoming more important. We present an approach to the design of an 
automatic text summarizer that generates a summary by extracting sentence segments. 
First, sentences are broken into segments by special cue markers. Each segment is 
represented by a set of predefined features (e.g. location of the segment, average term 
frequencies of the words occurring in the segment, number of title words ... 

Keywords: machine learning, sentence segment extraction, text summarization 



14 Query expansion using lexical-semantic relations 
Ellen M. Voorhees 

August 1994 Proceedings of the 17th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(692.34 KB) Additional Information: full citation, references, citings, index terms 
15 Paper session IR-10 (information retrieval): query expansion: Query expansion using 
term relationships in language models for information retrieval 
Jing Bai, Dawei Song, Peter Bruza, Jian-Yun Nie, Guihong Cao 

October 2005 Proceedings of the 14th ACM international conference on Information 
and knowledge management CIKM '05 

Publisher: ACM Press 

Full text available: pdf(428.57 KB) Additional Information: full citation, abstract, references, index terms 

Language Modeling (LM) has been successfully applied to Information Retrieval (IR). 
However, most of the existing LM approaches only rely on term occurrences in 
documents, queries and document collections. In traditional unigram based models, terms 
(or words) are usually considered to be independent. In some recent studies, dependence 
models have been proposed to incorporate term relationships into LM, so that links can be 
created between words in the same sentence, and term relationships (e.g. ... 

Keywords: information flow, language model, query expansion, term relationships 



16 Term clustering of syntactic phrases 
D. D. Lewis, W. B. Croft 

December 1989 Proceedings of the 13th annual international ACM SIGIR conference 
on Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(1.62 MB) Additional Information: full citation, abstract, references, citings, index terms 

Term clustering and syntactic phrase formation are methods for transforming natural 
language text. Both have had only mixed success as strategies for improving the quality 
of text representations for document retrieval. Since the strengths of these methods are 
complementary, we have explored combining them to produce superior representations. 
In this paper we discuss our implementation of a syntactic phrase generator, as well as 
our preliminary experiments with producing phrase clusters. Th ... 

17 TextTiling: segmenting text into multi-paragraph subtopic passages 
Marti A. Hearst 

March 1997 Computational Linguistics, volume 23 issue 1 
Publisher: MIT Press 

Full text available: pdf(2.46 MB) Additional Information: full citation, abstract, references, citings 
Publisher Site 

TextTiling is a technique for subdividing texts into multi-paragraph units that represent 
passages, or subtopics. The discourse cues for identifying major subtopic shifts are 
patterns of lexical co-occurrence and distribution. The algorithm is fully implemented and 
is shown to produce segmentation that corresponds well to human judgments of the 
subtopic boundaries of 12 texts. Multi-paragraph subtopic segmentation should be useful 
for many text analysis tasks, including information retrieval and ... 

18 Text Categorization: Topic difference factor extraction between two document sets 
and its application to text categorization 
Takahiko Kawatani 

August 2002 Proceedings of the 25th annual international ACM SIGIR conference on 
Research and development in information retrieval 
Publisher: ACM Press 

Full text available: pdf(249.49 KB) Additional Information: full citation, abstract, references, index terms 

To improve performance in text categorization, it is important to extract distinctive 
features for each class. This paper proposes topic difference factor analysis (TDFA) as a 
method to extract projection axes that reflect topic differences between two document 
sets. Suppose all sentence vectors that compose each document are projected onto 
projection axes. TDFA obtains the axes that maximize the ratio between the document 
sets as to the sum of squared projections by solving a generalized eigenv ... 

19 Fast and effective text mining using linear-time document clustering 
Bjornar Larsen, Chinatsu Aone 

August 1999 Proceedings of the fifth ACM SIGKDD international conference on 
Knowledge discovery and data mining 
Publisher: ACM Press 

Full text available: pdf(897.82 KB) Additional Information: full citation, references, citings, index terms 
20 Building efficient and effective metasearch engines 
Weiyi Meng, Clement Yu, King-Lup Liu 

March 2002 ACM Computing Surveys (CSUR), Volume 34 issue 1 
Publisher: ACM Press 

Full text available: pdf(416.07 KB) Additional Information: full citation, abstract, references, citings, index terms 

Frequently a user's information needs are stored in the databases of multiple search 
engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search 
engines and identify useful documents from the returned results. To support unified 
access to multiple search engines, a metasearch engine can be constructed. When a 
metasearch engine receives a query from a user, it invokes the underlying search engines 
to retrieve useful information for the user. Metasearch engines have ... 

Keywords: Collection fusion, distributed collection, distributed information retrieval, 
information resource discovery, metasearch 